{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "023d771c",
   "metadata": {
    "id": "023d771c"
   },
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/quick-tour/hello-pinecone.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/quick-tour/hello-pinecone.ipynb)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "conceptual-belfast",
   "metadata": {
    "editable": true,
    "id": "conceptual-belfast",
    "papermill": {
     "duration": 0.028714,
     "end_time": "2021-04-16T15:08:30.639231",
     "exception": false,
     "start_time": "2021-04-16T15:08:30.610517",
     "status": "completed"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# Hello, Pinecone!\n",
    "\n",
    "This notebook will walk through the steps to get a simple Pinecone index up and running.\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "first-affairs",
   "metadata": {
    "id": "first-affairs",
    "papermill": {
     "duration": 0.025469,
     "end_time": "2021-04-16T15:08:30.690710",
     "exception": false,
     "start_time": "2021-04-16T15:08:30.665241",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Prerequisites"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "banned-huntington",
   "metadata": {
    "editable": true,
    "id": "banned-huntington",
    "papermill": {
     "duration": 0.023078,
     "end_time": "2021-04-16T15:08:30.737798",
     "exception": false,
     "start_time": "2021-04-16T15:08:30.714720",
     "status": "completed"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "First we need to install a few dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "parallel-detective",
   "metadata": {
    "id": "parallel-detective",
    "papermill": {
     "duration": 18.119125,
     "end_time": "2021-04-16T15:08:48.881643",
     "exception": false,
     "start_time": "2021-04-16T15:08:30.762518",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install -qU pandas==2.2.3 pinecone==6.0.2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "272f3b6d",
   "metadata": {},
   "source": [
    "## Getting started\n",
    "\n",
    "We begin by instantiating an instance of the Pinecone client. To do this we need a [free API key](https://app.pinecone.io)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3f53fd40",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from pinecone import Pinecone\n",
    "\n",
    "# Get your API key at app.pinecone.io\n",
    "api_key = os.environ.get(\"PINECONE_API_KEY\") or \"PINECONE_API_KEY\"\n",
    "\n",
    "# Instantiate the Pinecone client\n",
    "pc = Pinecone(api_key=api_key)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "forbidden-indication",
   "metadata": {
    "id": "forbidden-indication",
    "papermill": {
     "duration": 0.020515,
     "end_time": "2021-04-16T15:08:49.925687",
     "exception": false,
     "start_time": "2021-04-16T15:08:49.905172",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Pinecone quickstart\n",
    "\n",
    "With Pinecone you can create a vector index where you can easily store and search through your vector embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "EA2EcZsCoWS3",
   "metadata": {
    "id": "EA2EcZsCoWS3",
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# Giving our index a name\n",
    "index_name = \"hello-pinecone\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "synthetic-essex",
   "metadata": {
    "id": "synthetic-essex",
    "papermill": {
     "duration": 0.446682,
     "end_time": "2021-04-16T15:08:50.393195",
     "exception": false,
     "start_time": "2021-04-16T15:08:49.946513",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Delete the index if an index of the same name already exists\n",
    "if pc.has_index(name=index_name):\n",
    "    pc.delete_index(name=index_name)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "94LRI2H8Ch2B",
   "metadata": {
    "editable": true,
    "id": "94LRI2H8Ch2B",
    "papermill": {
     "duration": 0.021764,
     "end_time": "2021-04-16T15:08:50.446400",
     "exception": false,
     "start_time": "2021-04-16T15:08:50.424636",
     "status": "completed"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "### Creating a Pinecone Index\n",
    "\n",
    "When creating the index we need to define several configuration properties. \n",
    "\n",
    "- `name` can be anything we like. The name is used as an identifier for the index when performing other operations such as `describe_index`, `delete_index`, and so on. \n",
    "- `metric` specifies the similarity metric that will be used later when you make queries to the index.\n",
    "- `dimension` should correspond to the dimension of the dense vectors produced by your embedding model. In this quick start, we are using made-up data so a small value is simplest.\n",
    "- `spec` holds a specification which tells Pinecone how you would like to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/troubleshooting/available-cloud-regions).\n",
    "\n",
    "There are more configurations available, but this minimal set will get us started."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4YwC8livCrn2",
   "metadata": {
    "id": "4YwC8livCrn2",
    "papermill": {
     "duration": 13.756687,
     "end_time": "2021-04-16T15:09:04.224466",
     "exception": false,
     "start_time": "2021-04-16T15:08:50.467779",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "    \"name\": \"hello-pinecone\",\n",
       "    \"metric\": \"cosine\",\n",
       "    \"host\": \"hello-pinecone-dojoi3u.svc.aped-4627-b74a.pinecone.io\",\n",
       "    \"spec\": {\n",
       "        \"serverless\": {\n",
       "            \"cloud\": \"aws\",\n",
       "            \"region\": \"us-east-1\"\n",
       "        }\n",
       "    },\n",
       "    \"status\": {\n",
       "        \"ready\": true,\n",
       "        \"state\": \"Ready\"\n",
       "    },\n",
       "    \"vector_type\": \"dense\",\n",
       "    \"dimension\": 3,\n",
       "    \"deletion_protection\": \"disabled\",\n",
       "    \"tags\": null\n",
       "}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pinecone import ServerlessSpec, CloudProvider, AwsRegion, Metric\n",
    "\n",
    "pc.create_index(\n",
    "    name=index_name,\n",
    "    metric=Metric.COSINE,\n",
    "    dimension=3,\n",
    "    spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "060bb093-fc60-4065-bb6f-529145e6f186",
   "metadata": {},
   "source": [
    "We can look up the configuration for the index anytime we like by using `describe_index`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "eeb2f680-7250-4117-bdbb-b19fc36d476b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "    \"name\": \"hello-pinecone\",\n",
       "    \"metric\": \"cosine\",\n",
       "    \"host\": \"hello-pinecone-dojoi3u.svc.aped-4627-b74a.pinecone.io\",\n",
       "    \"spec\": {\n",
       "        \"serverless\": {\n",
       "            \"cloud\": \"aws\",\n",
       "            \"region\": \"us-east-1\"\n",
       "        }\n",
       "    },\n",
       "    \"status\": {\n",
       "        \"ready\": true,\n",
       "        \"state\": \"Ready\"\n",
       "    },\n",
       "    \"vector_type\": \"dense\",\n",
       "    \"dimension\": 3,\n",
       "    \"deletion_protection\": \"disabled\",\n",
       "    \"tags\": null\n",
       "}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "description = pc.describe_index(name=index_name)\n",
    "description"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "j1F8SLx6C2HH",
   "metadata": {
    "id": "j1F8SLx6C2HH",
    "papermill": {
     "duration": 0.023037,
     "end_time": "2021-04-16T15:09:05.153116",
     "exception": false,
     "start_time": "2021-04-16T15:09:05.130079",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Upserting data into the index\n",
    "\n",
    "We can see the index ready. Now we will create some simple vectors that will serve as our examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "toy-VhU4LO_O",
   "metadata": {
    "id": "toy-VhU4LO_O",
    "papermill": {
     "duration": 0.846255,
     "end_time": "2021-04-16T15:09:05.097384",
     "exception": false,
     "start_time": "2021-04-16T15:09:04.251129",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Instantiate an Index client\n",
    "index = pc.Index(host=description.host)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "indirect-lafayette",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 112
    },
    "id": "indirect-lafayette",
    "outputId": "5bd49b0e-0187-4de2-e564-1d41c61b7bc9",
    "papermill": {
     "duration": 0.227373,
     "end_time": "2021-04-16T15:09:05.404700",
     "exception": false,
     "start_time": "2021-04-16T15:09:05.177327",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>vector</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>id-0</td>\n",
       "      <td>[0.19415077620011345, 0.12315138914527213, 0.9...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>id-1</td>\n",
       "      <td>[0.8274728097660323, 0.8350750339818135, 0.961...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>id-2</td>\n",
       "      <td>[0.9630530808168708, 0.46559222532176947, 0.04...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>id-3</td>\n",
       "      <td>[0.5879443851815274, 0.5590457108385455, 0.924...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>id-4</td>\n",
       "      <td>[0.6104298712136548, 0.2665978264705289, 0.858...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     id                                             vector\n",
       "0  id-0  [0.19415077620011345, 0.12315138914527213, 0.9...\n",
       "1  id-1  [0.8274728097660323, 0.8350750339818135, 0.961...\n",
       "2  id-2  [0.9630530808168708, 0.46559222532176947, 0.04...\n",
       "3  id-3  [0.5879443851815274, 0.5590457108385455, 0.924...\n",
       "4  id-4  [0.6104298712136548, 0.2665978264705289, 0.858..."
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import random\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def create_simulated_data_in_df(num_vectors):\n",
    "    df = pd.DataFrame(\n",
    "        data={\n",
    "            \"id\": [f\"id-{i}\" for i in range(num_vectors)],\n",
    "            \"vector\": [\n",
    "                [random.random() for i in range(description.dimension)]\n",
    "                for _ in range(num_vectors)\n",
    "            ],\n",
    "        }\n",
    "    )\n",
    "    return df\n",
    "\n",
    "\n",
    "df = create_simulated_data_in_df(10)\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "oiJKXWxoDjhK",
   "metadata": {
    "id": "oiJKXWxoDjhK",
    "papermill": {
     "duration": 0.022968,
     "end_time": "2021-04-16T15:09:05.452054",
     "exception": false,
     "start_time": "2021-04-16T15:09:05.429086",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We perform `upsert` operations in our index. This call will insert a new vector in the index or update the vector if the id was already present."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "efficient-parliament",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "efficient-parliament",
    "outputId": "0d9fbac4-4f8a-421e-95a9-0f441d2dcc16",
    "papermill": {
     "duration": 0.704503,
     "end_time": "2021-04-16T15:09:06.180549",
     "exception": false,
     "start_time": "2021-04-16T15:09:05.476046",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'upserted_count': 10}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index.upsert(vectors=zip(df.id, df.vector))  # insert vectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "enclosed-performer",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "enclosed-performer",
    "outputId": "5b67ec13-6863-4b1a-ac45-b57c569923ee",
    "papermill": {
     "duration": 0.140473,
     "end_time": "2021-04-16T15:09:06.352169",
     "exception": false,
     "start_time": "2021-04-16T15:09:06.211696",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Vector count:  0\n",
      "Vector count:  0\n",
      "Vector count:  0\n",
      "Vector count:  0\n",
      "Vector count:  0\n",
      "Vector count:  0\n",
      "Vector count:  0\n",
      "Vector count:  0\n",
      "Vector count:  0\n",
      "Vector count:  10\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "def is_fresh(index):\n",
    "    stats = index.describe_index_stats()\n",
    "    vector_count = stats.total_vector_count\n",
    "    print(f\"Vector count: \", vector_count)\n",
    "    return vector_count > 0\n",
    "\n",
    "\n",
    "while not is_fresh(index):\n",
    "    # It takes a few moments for vectors we just upserted\n",
    "    # to become available for querying\n",
    "    time.sleep(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "1105937a-e37b-4110-86a8-58ca1bcf9797",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "enclosed-performer",
    "outputId": "5b67ec13-6863-4b1a-ac45-b57c569923ee",
    "papermill": {
     "duration": 0.140473,
     "end_time": "2021-04-16T15:09:06.352169",
     "exception": false,
     "start_time": "2021-04-16T15:09:06.211696",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'dimension': 3,\n",
       " 'index_fullness': 0.0,\n",
       " 'metric': 'cosine',\n",
       " 'namespaces': {'': {'vector_count': 10}},\n",
       " 'total_vector_count': 10,\n",
       " 'vector_type': 'dense'}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# View index stats\n",
    "index.describe_index_stats()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d4b3875-7525-4bc6-835e-8509e46f2c50",
   "metadata": {},
   "source": [
    "## Running a query\n",
    "\n",
    "Next we can run a query.\n",
    "\n",
    "In a more realistic scenario, the `vector` values passing into `query` would be an embedding vector of something meaningful. But for this simple walkthrough we will use made up values. The query will succeed as long as the dimension matches the dimension of our index.\n",
    "\n",
    "`top_k` specifies the number of results we would like returned. The method will return up to `top_k` results, but may be less if there are fewer than `top_k` vectors in your index or if all indexes have been filtered out using metadata filters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "leading-shape",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "leading-shape",
    "outputId": "fb512e95-ebf4-4e1d-b9c7-74afc3cdd0c2",
    "papermill": {
     "duration": 2.177493,
     "end_time": "2021-04-16T15:09:08.564594",
     "exception": false,
     "start_time": "2021-04-16T15:09:06.387101",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'matches': [{'id': 'id-1',\n",
       "              'score': 0.997445405,\n",
       "              'values': [0.827472806, 0.835075, 0.961279154]},\n",
       "             {'id': 'id-3',\n",
       "              'score': 0.972650886,\n",
       "              'values': [0.587944388, 0.559045732, 0.924576044]},\n",
       "             {'id': 'id-9',\n",
       "              'score': 0.958372176,\n",
       "              'values': [0.323696256, 0.569944143, 0.704567373]},\n",
       "             {'id': 'id-4',\n",
       "              'score': 0.921065927,\n",
       "              'values': [0.610429883, 0.266597837, 0.858841419]},\n",
       "             {'id': 'id-7',\n",
       "              'score': 0.906878114,\n",
       "              'values': [0.2434071, 0.738737464, 0.967846]}],\n",
       " 'namespace': '',\n",
       " 'usage': {'read_units': 1}}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# In a more realistic scenario, this would be an embedding vector\n",
    "# that encodes something meaningful. For this simple demo, we will\n",
    "# make up a vector that matches the dimension of our index.\n",
    "query_embedding = [2.0, 2.0, 2.0]\n",
    "\n",
    "index.query(vector=query_embedding, top_k=5, include_values=True)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "z5jcU5_SLMFC",
   "metadata": {
    "id": "z5jcU5_SLMFC",
    "papermill": {
     "duration": 0.035627,
     "end_time": "2021-04-16T15:09:08.629172",
     "exception": false,
     "start_time": "2021-04-16T15:09:08.593545",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Delete the Index\n",
    "Delete the index once you are sure that you do not want to use it anymore. **Deletion is permanent**. Once the index is deleted, you cannot use it again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "indian-broadcast",
   "metadata": {
    "id": "indian-broadcast",
    "papermill": {
     "duration": 12.505772,
     "end_time": "2021-04-16T15:09:21.171527",
     "exception": false,
     "start_time": "2021-04-16T15:09:08.665755",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "pc.delete_index(name=index_name)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 52.201461,
   "end_time": "2021-04-16T15:09:21.730976",
   "environment_variables": {},
   "exception": null,
   "input_path": "/notebooks/quick_tour/hello_pinecone.ipynb",
   "output_path": "/notebooks/tmp/quick_tour/hello_pinecone.ipynb",
   "parameters": {},
   "start_time": "2021-04-16T15:08:29.529515",
   "version": "2.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
